Subgroup Simulations

Evaluation of potential for extreme effects

Author

Larry Leon

Code
knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE)

options(warn = -1)

rm(list=ls())

library(tinytex)
library(ggplot2)
library(table1)

library(gt)

library(splines)

library(survival)

library(grid)
library(data.table)
library(forestploter)
library(randomizr)

# Source function files
source("KMplotting_functions_v0.R")
source("sim_weibspline_functions_v1.R")
source("sim_subgroup_analyses_v1.R")
source("sim_subgroup_summarytables_v1.R")

1 Summary

We evaluate the potential for extreme subgroup analysis results when conducting many subgroup analyses in survival analysis settings.

Simulations are based on the actual dataset in order to generate scenarios that mimic the trial conditions.

We assume Cox model analyses are the primary analysis where only the treatment arm is included as a covariate (with or without stratification by randomization factors).

In this note we provide a brief review of the Weibull model in Section 2 where we describe a (2-phase) spline model that we utilize for inducing differential treatment effects by way of a ‘’biomarker’‘. Section 3 provides examples of how to generate’‘biomarker profiles’’ of interest where one can define subgroups based on biomarker cutpoints which correspond to underlying causal hazard ratio effects (e.g., causal hazard-ratio effect for biomarker \(\geq 2\) is approximately \(0.5\), say). Section 4 describes the simulation setup (data generating model/process) and generates an example dataset.

Here we use the German Breast Cancer Study Group (gbsg) dataset which is publicly available in the R survival package.

2 Weibull AFT/Cox model with biomarker effects

2.1 Brief Review

For the Weibull distribution with shape parameter \(\nu\) and scale parameter \(\theta\) the density, cdf, survival, hazard and cumulative hazard functions are: \[\begin{eqnarray*} f(t) &=& \nu \theta^{-\nu}t^{\nu-1}\exp(-(t/\theta)^{\nu}), \cr F(t) &=& \int_{0}^{t}\nu \theta^{-\nu}s^{\nu-1}\exp(-(s/\theta)^{\nu})ds \cr &=& \int_{0}^{(t/\theta)^{\nu}}e^{-w}dw \cr & & (w=(s/\theta)^{\nu},dw=\nu s^{\nu-1}\theta^{-\nu}ds) \cr &=&1-\exp(-(t/\theta)^{\nu}), \cr S(t) &=& \exp(-(t/\theta)^{\nu}), \cr \lambda(t)&=&\nu \theta^{-\nu}t^{\nu-1}, \cr \Lambda(t) &=& -\log(S(t))=(t/\theta)^{\nu}. \end{eqnarray*}\] Note that here we define the density to correspond with R’s definition. For shape parameter \(\nu \in (0,1)\) the hazard is strictly decreasing in \(t \geq 0\), whereas for \(\nu >1\) the hazard is strictly increasing in \(t \geq 0\).

Note: \(\Lambda(T) \sim E(1)\)

The cumulative hazard function \(\Lambda(\cdot)\) evaluated at \(T\), \(\Lambda(T)\), as a random variable has cdf \[\eqalign{ \Pr(\Lambda(T) \leq t) &=\Pr(-\log(1-F(T)) \leq t) =\Pr(1-F(T) \geq e^{-t}) \cr &=\Pr(T \leq F^{-1}(1-e^{-t})) =F(F^{-1}(1-e^{-t})) = 1-e^{-t}, \cr}\] which is the CDF of the exponential distribution, \(E(1)\) (say).

In the following we use

\[\begin{equation} \tag{1} \Lambda(T) \sim E(1) \end{equation}\] to represent the Weibull regression model as an AFT model which is also a Cox model.

Now, \(\Lambda(T)=(T/\theta)^{\nu}\) and write \(Q=-\log(S(T))=\Lambda(T)=(T/\theta)^{\nu}\), where from (1), \(Q \sim E(1)\). That is \(\log(Q)=\nu(\log(T)-\log(\theta))\) can be expressed as

\[\begin{equation} \tag{2} \log(T)=\log(\theta)+ \tau \log(Q) := \log(\theta) + \tau \epsilon, \end{equation}\] where \(\tau=1/\nu\) and it is easy to show that \(\epsilon=\log(Q)\) has the ``extreme value’’ distribution with density \(f_{\epsilon}(x)=\exp(x-e^{x})\) for \(x \in {\cal R}\). Here the range of \(\log(T) \in {\cal R}\) is un-restricted. The survReg routine uses the parameterization \((2)\) and therefore estimates \(\log(\theta)\) and \(\tau=1/\nu\).

To incorporate covariates \(L\) (say) we specify \[\lambda(t;L)=\big(\nu \theta^{-\nu}t^{\nu-1} \big) \exp(L'\beta) := \lambda_{0}(t)\exp(L'\beta),\] where \(\lambda_{0}(t)\) is the hazard, say, for \(T_{0} \sim \hbox{Weibull}(\nu,\theta)\). This is a special case of the proportional hazards model. The chf (conditional chf with covariate vector \(L\)) is \(\Lambda(t;Z)=(t/\theta)^{\nu}\exp(L'\beta)\) so that analogous to above this leads to the representation

\[\begin{equation} \tag{3} \log(T) =\log(\theta)+\tau[-L'\beta+\epsilon] =\log(\theta)+L'\gamma + \tau \epsilon, \end{equation}\] where \(\gamma=-\tau\beta\), with \(\tau\) and \(\epsilon\) defined in (2). R survReg uses this AFT parameterization so that the estimated components of \(\gamma\), \(\gamma_{p}\) say, are that of \(-\tau\beta_{p}\) for \(p=1,\ldots,m\) (\(m\) is dimension of \(\beta\)).

When fitting the AFT model (3) via suvreg we therefore transform parameters \(\hat\gamma\) to the Weibull hazard-ratio parameterization (2) via

\[\begin{equation} \tag{4} \hat\beta = -\hat\gamma / \hat{\tau}. \end{equation}\]

As an illustration we compare the survReg model fits for the case-study dataset. The following table below compares the Weibull survReg model fits with covariates Treat (hormon) and menopausal status (meno = 1 vs 0) where components of \(\hat\gamma\) from model (3) are calculated according to survReg and \(\hat\beta\) are formed via (4). In the table below Weibull estimates of \(\hat\beta\) are compared to Cox model versions.

Code
# case-study example
df.case <- gbsg
names(df.case)
 [1] "pid"     "age"     "meno"    "size"    "grade"   "nodes"   "pgr"    
 [8] "er"      "hormon"  "rfstime" "status" 
Code
# Comparing Weibull vs Cox with case-study 
# This is just for illustration to show conversion of Weibull parameters from 
# AFT regression to Weibull hazard 
fit.weib_ex <- survreg(Surv(rfstime,status) ~ hormon + meno, dist='weibull', data=df.case)
tauhat <- fit.weib_ex$scale
# convert (treat,ecog1) regression parms to Weibull hazard-ratio
bhat.weib <- -(1)*coef(fit.weib_ex)[c(2,3)]/tauhat
# Compare to Cox 
fit.cox_ex <- coxph(Surv(rfstime,status) ~ hormon + meno, data=df.case)
res <- cbind(bhat.weib,coef(fit.cox_ex))
res <- as.data.frame(res)
colnames(res)<-c("Weibull","Cox")
res |> gt() |>
fmt_number(columns=1:2,decimals=6) |>
tab_header(title="Comparing Weibull to Cox hazard ratio estimates",
subtitle="Case-study dataset")
Comparing Weibull to Cox hazard ratio estimates
Case-study dataset
Weibull Cox
−0.432739 −0.400314
0.171343 0.152596

We will use estrogen receptors (er) as a biomarker where we will induce true biomarker treatment effects as described below. First let’s see whether er appears influential by looking at a spline model which will be the basis for generating biomarker effects.

Code
# Convert rfstime to months
# Suppose stratum is grade
# Biomarker is log(er+1)
df.case <- within(df.case,{
  treat <- hormon
  event <- status
  tte <- rfstime/30.4375
  z <- log(er+1)
  stratum <- grade
  # consider z <= log(2) as "biomarker low"
  bm_low <- ifelse(z <= log(2), 1,0)
  })
Code
table1(~ age + factor(bm_low) + factor(meno) + size + factor(grade) + nodes + pgr + er | hormon, data=df.case)
0
(N=440)
1
(N=246)
Overall
(N=686)
age
Mean (SD) 51.1 (9.96) 56.6 (9.41) 53.1 (10.1)
Median [Min, Max] 50.0 [21.0, 80.0] 58.0 [32.0, 80.0] 53.0 [21.0, 80.0]
factor(bm_low)
0 373 (84.8%) 215 (87.4%) 588 (85.7%)
1 67 (15.2%) 31 (12.6%) 98 (14.3%)
factor(meno)
0 231 (52.5%) 59 (24.0%) 290 (42.3%)
1 209 (47.5%) 187 (76.0%) 396 (57.7%)
size
Mean (SD) 29.6 (14.4) 28.8 (14.1) 29.3 (14.3)
Median [Min, Max] 25.0 [3.00, 120] 25.0 [4.00, 100] 25.0 [3.00, 120]
factor(grade)
1 48 (10.9%) 33 (13.4%) 81 (11.8%)
2 281 (63.9%) 163 (66.3%) 444 (64.7%)
3 111 (25.2%) 50 (20.3%) 161 (23.5%)
nodes
Mean (SD) 4.94 (5.56) 5.13 (5.33) 5.01 (5.48)
Median [Min, Max] 3.00 [1.00, 51.0] 3.00 [1.00, 36.0] 3.00 [1.00, 51.0]
pgr
Mean (SD) 102 (170) 124 (250) 110 (202)
Median [Min, Max] 32.0 [0, 1600] 35.0 [0, 2380] 32.5 [0, 2380]
er
Mean (SD) 79.7 (124) 126 (191) 96.3 (153)
Median [Min, Max] 32.0 [0, 898] 46.0 [0, 1140] 36.0 [0, 1140]
Code
# non-stratified
kmH.fit<-KM.plot.2sample.weighted(df=df.case, 
tte.name="tte", event.name="event", treat.name="treat",
risk.set=TRUE, by.risk=6,
details=TRUE,
Xlab="Months",Ylab="Recurrence-free survival",
arms=c("Experimental","Control"))
Cox un-adjusted HR= 0.695 
Cox CIs= 0.544 0.888 
My p-value and survdiff= 0.003427282 0.003427282 
My z^2 and survdiff= 8.564781 8.564781 
Comparing with R's survfit 
Treat=1: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Treat=0: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Comparing with R's survfit (experimental 
Treat=1: n,events= 246 94 
Median, Lower, Upper= 66.29979 63.01437 Inf 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
   246.00    246.00    246.00     94.00     59.68      2.15     66.30     63.01 
  0.95UCL 
       NA 
Comparing with R's survfit (Control 
Treat=0: n,events= 440 205 
Median, Lower, Upper= 50.20123 42.57906 59.59754 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
   440.00    440.00    440.00    205.00     50.51      1.62     50.20     42.58 
  0.95UCL 
    59.60 
Code
title(main="ITT Population")

Code
# ydel (default=0.25) is how much room to provide in space for counts
# Note that byz=0.25 plots fits by units of 0.25 for biomarker
# zwindow=0.25 creates counts of subjects with biomarker within 0.25 units of z
# That is, for z=0.5 there were 16 subjects within (0.25,0.75)
temp <- cox_cs_fit2(df=df.case,tte.name="tte",event.name="event",treat.name="treat",
strata.name="stratum", z.name=c("z"), details=FALSE ,boots=0, maxz=8, xlab=c("Biomarker"), byz=0.25, zwindow=0.25, ydel=0.5, show_plot=TRUE,truebeta.name=NULL)
title("ITT")

There is some indication that lower biomarker values may have reduced efficacy.

In the following we will describe a two-phase spline model that can generate such profiles.

Code
# z <= 1.0
# non-stratified
kmH.fit<-KM.plot.2sample.weighted(df=subset(df.case,z<=1), 
tte.name="tte", event.name="event", treat.name="treat",
risk.set=TRUE, by.risk=6,
details=TRUE,
Xlab="Months",Ylab="Recurrence-free survival",
arms=c("Experimental","Control"))
Cox un-adjusted HR= 1.238 
Cox CIs= 0.7 2.192 
My p-value and survdiff= 0.4621491 0.4621491 
My z^2 and survdiff= 0.5406846 0.5406846 
Comparing with R's survfit 
Treat=1: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Treat=0: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Comparing with R's survfit (experimental 
Treat=1: n,events= 31 17 
Median, Lower, Upper= 23.72074 18.00411 Inf 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
    31.00     31.00     31.00     17.00     36.73      5.27     23.72     18.00 
  0.95UCL 
       NA 
Comparing with R's survfit (Control 
Treat=0: n,events= 67 39 
Median, Lower, Upper= 39.16222 27.66324 Inf 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
    67.00     67.00     67.00     39.00     45.96      3.89     39.16     27.66 
  0.95UCL 
       NA 
Code
title(main="Biomarker <= 1 Population")

2.2 Biomarker effects with spline model

We now outline how potential outcomes are simulated according to parameters fit to the case-study dataset but with parameters specified to induce biomarker effects. That is, causal treatment effects (on log(hazard-ratio) scale) that follow a spline model according to patterns where biomarker effects increase with biomarker levels; Including various degrees of limited treatment effects for low biomarker levels.

We first consider a Weibull model with treatment and a single biomarker covariate \(Z\) where we write the linear predictor of the Cox model \(L'\beta\) (say) as

\[\begin{equation} \tag{5} L'\beta := \beta_{1}\hbox{Treat} + \beta_{2}\hbox{Z} + \beta_{3}\hbox{Z}\hbox{Treat} + \beta_{4}(\hbox{Z}-k)I(\hbox{Z}>k) + \beta_{5}(\hbox{Z}-k)I(\hbox{Z}>k)\hbox{Treat}. \end{equation}\]

Following the potential-outcome approach let \(l_{x,z}\) denote subject’s hazard-function “had they followed treatment regimen \(Treat=x\) while having biomarker level \(Z=z\)”. That is, for subject with biomarker level \(Z=z\) we can simulate their survival outcomes under both treatment (\(x=1\)) and control (\(x=0\)) conditions. Let \(\beta^{0} = (\beta_{1}^{0},\ldots,\beta_{5}^{0})'\) denote the true coefficients and denote the hazard function as

\[\lambda_{x,z}(t) = \lambda_{0}(t)\exp(l_{x,z}), \quad \hbox{say}.\]

Writing

\[\begin{equation} l_{x,z} = \beta^{0}_{1}x + \beta^{0}_{2}z + \beta^{0}_{3}zx + \beta^{0}_{4}(z-k)I(z>k) + \beta^{0}_{5}(z-k)I(z>k)x, \end{equation}\] the log of the hazard ratio for biomarker level \(z\) under treatment (\(x=1\)) relative to control (\(x=0\)) is given by

\[\begin{equation} \tag{6} \psi^{0}(z) := \log(\lambda_{1,z}(t)/\lambda_{0,z}(t)) = \beta^{0}_{1} + \beta^{0}_{3}z + \beta^{0}_{5}(z-k)I(z>k). \end{equation}\]

2.3 Causal log-hazard-ratio

The log(hazard-ratio) for biomarker level \(z\) is a linear function of \(z\) with a change-point (in slope) at \(z=k\) given by

\[\begin{eqnarray*} \psi^{0}(z) &=& \beta^{0}_{1} + \beta^{0}_{3}z, \quad \hbox{for} \ z \leq k, \cr &=& \beta^{0}_{1} + \beta^{0}_{3}z + \beta^{0}_{5}(z-k), \quad \hbox{for} \ z > k. \end{eqnarray*}\]

Log hazard-ratio parameters \((\beta^{0}_{1},\beta^{0}_{3},\beta^{0}_{5})\) can be chosen to generate “treatment effect patterns” by specifying \(\psi^{0}(z)\) values at \(z=0\), \(z=k\), and \(z=\zeta\) for \(\zeta > k\). For specified \(\psi^{0}(0)\), \(\psi^{0}(k)\), and \(\psi^{0}(\zeta)\) we have

\[\begin{eqnarray} \beta^{0}_{1} &=& \psi^{0}(0), \cr \beta^{0}_{3} &=& {(\psi^{0}(k) - \beta^{0}_{1}) \over k}, \cr \beta^{0}_{5} &=& {(\psi^{0}(\zeta) - \beta^{0}_{1} - \beta^{0}_{3}\zeta) \over (\zeta -k)}. \end{eqnarray}\]

The function get_dgm_stratified generates \(\psi^{0}(z)\) according to desired “biomarker treatment effect patterns” as follows.

  • Let \(X\) and \(Z\) denote the treatment and biomarker variables in the case-study dataset and for specified \(k\), form the covariates \(L:=(X,Z,ZX,(Z-k)I(Z>k),(Z-k)I(Z>k)X))\);
  • Fit the Weibull model (recall on AFT scale) to get \(\log(\hat\theta)\), \(\hat\tau=1/\hat{\nu}\), and \(\hat\gamma\) corresponding to \(L\);
  • \(\hat\gamma\) is in terms of the AFT parameterization given by model (3)
  • Next transform to the Weibull (Cox) log hazard-ratio parameterization (4): \(\hat\beta = -\hat\gamma/\hat\tau\)
  • Set “true” parameters \(\theta^{0}=\hat\theta\), and \(\tau^{0}=\hat\tau\)
  • Initialize the “true” parameter \(\beta^{0} = \hat\beta\) and re-define parameters 1, 3, and 5 in order to satisfy specified \(\psi^{0}(0)\), \(\psi^{0}(k)\), and \(\psi^{0}(\zeta)\): \(\beta^{0}[1] = \psi^{0}(0)\), \(\beta^{0}[3] = (\psi^{0}(k) - \beta^{0}[1])/k\), and \(\beta^{0}[5] = (\psi^{0}(\zeta) - \beta^{0}[1] - \beta^{0}[3]\zeta)/(\zeta -k)\);
  • Form corresponding \(\gamma^{0}= -\beta^{0}\tau^{0}\)
  • For simulations we use the AFT parameterization (3) to generate \(\log(T)\) outcomes according to \(\log(T) = \log(\theta^{0}) + L'\gamma^{0} + \tau^{0}\epsilon\) where recall \(\epsilon\) has the ``extreme value’’ distribution.

The inputs of the get_dgm_stratified function are:

  • The case-study dataset (“df”)
  • “knot”=\(k\), “zeta”=\(\zeta\), and “log.hrs”= (\(\psi^{0}(0),\psi^{0}(k),\psi^{0}(\zeta))\)
  • Note that get_dgm_stratified allows for outcomes to follow a stratified Weibull (Cox) models in which case the log(hazard-ratio) effects will depend on the strata, “strata_tte”, where the default is non-stratified (“strata_tte=NULL”)
  • If stratified the \(\tau\) parameters and hence \(\gamma^{0}\) depend on the strata
  • If stratified, then to simplify, the \(\gamma^{0}\) effects are calculated based on the median of the \(\tau^{0}\)’s (\(\tau^{0}=\tau_{med}\), say).

To-do –> explain outputs and how used in draw_sim_stratified

2.4 Example where treatment effects increase with increasing biomarker

We assume that \(z=0\) indicates biomarker negative values with \(z>1\) indicating positive levels. As an example, suppose the log(hazard ratio) at \(z=0\) is \(\psi^{0}(0)=\log(3)\) and decreases linearly for \(z \leq 5\) (with slope \(\beta_{3}^{0}\)) such that at \(z=5\), \(\psi^{0}(5)=\log(1.25)\) and for \(z>5\) decreases linearly (with a change in slope and intercept) such that at \(z=10\), \(\psi^{0}(10) = \log(0.5)\). In the following we will describe summary measures of the treatment effects as a function of increasing [or decreasing] values of the biomarker. In this example treatment is detrimental for lower values of the biomarker with treatment effects increasing fairly quickly as the biomarker increases with an overall ``average hazard ratio’’ (AHR) of \(\approx 0.74\) (see Working Example below).

2.5 Including prognostic factors \(W\)

We extend the model to include a baseline prognostic factor \(W\) within \(L\) (effect parameter \(\beta^{0}_{w}\)) where

\[\begin{equation} l_{x,z,w} = \beta^{0}_{1}x + \beta^{0}_{2}z + \beta^{0}_{3}zx + \beta^{0}_{4}(z-k)I(z>k) + \beta^{0}_{5}(z-k)I(z>k)x + \beta^{0}_{w}w. \end{equation}\]

Note that defining \(m(1,z,w)\) [\(m(0,z,w)\))] as the conditional expected value of \(\log(T)\) with treatment set to experimental [control] given biomarker level \(Z=z\) and \(W=w\)

\[\psi^{0}(z,w) = \{m(0,z,w)-m(1,z,w)\}/\tau\] which corresponds to the difference in the (true) means of the potential outcomes under experimental and control conditions (\(\log(T[0,z,w]\) versus \(\log(T[1,z,w]\), say)

3 Specifying biomarker treatment effects

3.1 Average hazard ratios (AHRs)

We define the biomarker average hazard ratio (AHR) as the expected value of \(\psi^{0}(\cdot)\) across “increasing biomarker” sub-populations. For example, \(\hbox{AHR}(2^{+})\) represents the AHR for subjects with biomarker values \(\geq 2\) via

\[\begin{equation} \tag{6} \hbox{AHR}(z^{+}):= \exp\left\{E_{Z \geq z} \psi^{0}(Z) \right\}; \end{equation}\]

With the opposite direction – across “decreasing biomarker” sub-populations

\[\begin{equation} \tag{7} \hbox{AHR}(z^{-}):= \exp\left\{E_{Z \leq z} \psi^{0}(Z) \right\}. \end{equation}\]

Here the expectations are over the distribution of \(Z\), however if there is a prognostic factor \(W\) then the expectations are over \((Z,W)\). In our calculations we calculate empirical averages.

3.2 Controlled direct-effect (CDE) versions (Aalen, Cook, and Røysland (2015))

Averaging across hazard ratios: Define the hazard (omitting baseline hazard) \(\theta^{x}(z,w) = \exp(l(x,z,w))\) setting treatment to \(X=x\) given \(Z=z\) and \(W=w\). Aalen et al. (Aalen, Cook, and Røysland (2015)) define the controlled direct-effect (CDE) as the ratio of the expected hazards. Here we consider the above cumulative versions (omitting possible dependence on \(W\)). Let \(\bar\theta^{x}(z+) = E_{Z \geq z}\theta^{x}(Z)\) for \(x=0,1\), and define

\[\hbox{CDE}(z^{+}):= \bar\theta^{1}(z+)/\bar\theta^{0}(z+), \quad \hbox{and} \quad \hbox{CDE}(z^{-}):= \bar\theta^{1}(z-)/\bar\theta^{0}(z-)\]

3.3 Compare ‘AHR’ versions under various biomarker specifications

3.3.1 Under PH (uniform effects, hr=0.7), with strong prognostic effect \(\beta_{w}= -\log(5)\)

Code
# Outcome model is NON-stratified (strata_tte=NULL)
# PH 

log.hrs <- log(c(0.6,0.6,0.6))

dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=FALSE)

# Simulate with randomization factors "stratum" 
# wname="meno" specifies prognostic effect factor
# bw=log(5) denotes log(hazard ratio) relative effect with respect to wname factor

# Draw very large-sample to get approximation to bias
# Note: return_df will NOT return the large simulated dataset but the "large-sample" estimates as a list
# checking=TRUE will compare the estimated tau (stratified) parameters based on the case-study dataset to 
# the versions based on the simulated dataset
# details=TRUE will output the approximations to Cox model estimators from several models ("population summaries")
# return_df =FALSE will return the "population summaries" as a list as well as the biomarker AHR functional (see below)

pop_summary1 <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=10000,wname="meno",bw=-log(5),strata_rand="stratum",checking=TRUE,details=TRUE,return_df=FALSE)
% censored = 0.7638 
Stratification parm (taus) df_super 0.7663219 
Stratification parm (taus) simulated= 0.7682139 
Max |loghr.po - (log.Y0-log.Y1)/tau| =  0 
Overall empirical AHR, CDE= 0.6 0.6 
AHR W=1, W=0 0.6 0.6 
CDE W=1, W=0 0.6 0.6 
Cox ITT: Un-adjusted, sR, W 0.6184831 0.6179322 0.5869613 
Cox W=1 Sub-population 0.5499276 
Cox W=0 Sub-population 0.6003528 

Code
plot_AHRs(popsummary=pop_summary1,dfcase=df.case)

3.3.2 Under ‘’moderate’’ biomarker effects

Code
# Moderate biomarker
log.hrs <- log(c(0.9,0.8,0.65))
dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=FALSE)
# Strong prognostic factor 
pop_summary5 <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=10000,wname="meno",bw=-log(5),strata_rand="stratum",checking=TRUE,details=TRUE,return_df=FALSE)
% censored = 0.734 
Stratification parm (taus) df_super 0.7663219 
Stratification parm (taus) simulated= 0.7633213 
Max |loghr.po - (log.Y0-log.Y1)/tau| =  0 
Overall empirical AHR, CDE= 0.8299495 0.8437643 
AHR W=1, W=0 0.8229195 0.8397564 
CDE W=1, W=0 0.8327789 0.8465978 
Cox ITT: Un-adjusted, sR, W 0.8518947 0.8521851 0.8341693 
Cox W=1 Sub-population 0.7413288 
Cox W=0 Sub-population 0.8690962 

Code
plot_AHRs(popsummary=pop_summary5,dfcase=df.case)

3.4 Calculating causal ‘AHR’ effects for subgroups

To evaluate the estimation properties of our procedures we calculate the empirical average within subgroup \(\cal G\) (say) which can represent the estimated subgroup or can be the ITT population (or any underlying subgroup). That is, for each subject with characteristics \({\bf L}=(Z,W)\) say, we have their individual log hazard-ratio contrast \(\psi^{0}({\bf L})\); We estimate \(\hbox{AHR}({\cal G}):= \exp\left\{E_{\cal G} \psi^{0}\right\}\) by calculating the empirical average

\[\begin{equation} \tag{8} \hbox{AHR}({\cal G}):= \exp(\bar\beta({\cal G})), \quad \hbox{where} \quad \bar\beta({\cal G}) =(1/n({\cal G}))\sum_{g \in {\cal G}} \psi^{0}({\bf L}_{g}), \end{equation}\] with \(n({\cal G})\) denoting the size of \({\cal G}\).

Note that for now we will not incorporate this … since we will first consider the scenario of a uniform treatment effect

But can be of interest when we want to calculate bias when subgroup effects are induced either directly or via indirectly via correlation with biomarker (when biomarker effect exists)

Code
# Outcome model is NON-stratified (strata_tte=NULL)
# PH 
log.hrs <- log(c(0.6,0.6,0.6))
dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=FALSE)

Lets check if there is any imbalance (asymptotically) by biomarker by drawing a large sample (N=10,000) and viewing summary tables:

Code
# This time return large dataset (corresponding to that used in the above asymptotic approximations)
dflarge <- draw_sim_stratified(dgm=dgm,ss=1,Ndraw=10000,wname="meno",bw=-log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE)
# create factor versions of treatment and AP region
dflarge$trtsim <- ifelse(dflarge$treat.sim==1,"Experimental","Control")
dflarge$menopausal <- ifelse(dflarge$meno==1,"post-meno","pre-meno")
dflarge$bm_log2 <- ifelse(dflarge$z<=log(2),"biomarker<=log(2)","biomarker>log(2)")
table1( ~ z + bm_log2 | trtsim, data=dflarge, caption=c("ITT by treatment arm"))
ITT by treatment arm
Control
(N=5000)
Experimental
(N=5000)
Overall
(N=10000)
z
Mean (SD) 3.32 (1.86) 3.36 (1.83) 3.34 (1.85)
Median [Min, Max] 3.60 [0, 7.04] 3.61 [0, 7.04] 3.61 [0, 7.04]
bm_log2
biomarker<=log(2) 742 (14.8%) 707 (14.1%) 1449 (14.5%)
biomarker>log(2) 4258 (85.2%) 4293 (85.9%) 8551 (85.5%)
Code
table1( ~ z + bm_log2 | menopausal, data=dflarge, caption=c("ITT by menopausal status"))
ITT by menopausal status
post-meno
(N=5800)
pre-meno
(N=4200)
Overall
(N=10000)
z
Mean (SD) 3.64 (1.93) 2.91 (1.64) 3.34 (1.85)
Median [Min, Max] 4.09 [0, 7.04] 3.22 [0, 6.44] 3.61 [0, 7.04]
bm_log2
biomarker<=log(2) 751 (12.9%) 698 (16.6%) 1449 (14.5%)
biomarker>log(2) 5049 (87.1%) 3502 (83.4%) 8551 (85.5%)
Code
table1( ~ z + bm_log2 | trtsim, data=subset(dflarge,meno==1), caption=c("Post-menopausal  by treatment arm"))
Post-menopausal by treatment arm
Control
(N=2905)
Experimental
(N=2895)
Overall
(N=5800)
z
Mean (SD) 3.61 (1.95) 3.67 (1.91) 3.64 (1.93)
Median [Min, Max] 4.06 [0, 7.04] 4.14 [0, 7.04] 4.09 [0, 7.04]
bm_log2
biomarker<=log(2) 393 (13.5%) 358 (12.4%) 751 (12.9%)
biomarker>log(2) 2512 (86.5%) 2537 (87.6%) 5049 (87.1%)
Code
rm("dflarge")

Next, a simulated example with same sample size as the case-study but randomized 1:1 (note, can retain original randomization with keep_rand=TRUE … shown next)

Code
df_example <- draw_sim_stratified(dgm=dgm,ss=1,wname="meno",bw=-log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE)
# non-stratified
kmH.fit<-KM.plot.2sample.weighted(df=df_example, 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",
risk.set=TRUE, by.risk=6,
details=TRUE,
Xlab="Months",Ylab="Recurrence-free survival",
arms=c("Experimental","Control"))
Cox un-adjusted HR= 0.477 
Cox CIs= 0.336 0.677 
My p-value and survdiff= 2.318984e-05 2.318984e-05 
My z^2 and survdiff= 17.90756 17.90756 
Comparing with R's survfit 
Treat=1: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Treat=0: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Comparing with R's survfit (experimental 
Treat=1: n,events= 342 48 
Median, Lower, Upper= Inf Inf Inf 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
   342.00    342.00    342.00     48.00    102.00      3.31        NA        NA 
  0.95UCL 
       NA 
Comparing with R's survfit (Control 
Treat=0: n,events= 344 90 
Median, Lower, Upper= Inf 76.46464 Inf 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
   344.00    344.00    344.00     90.00     82.93      2.95        NA     76.46 
  0.95UCL 
       NA 
Code
title(main="ITT Population")

Retain original randomization

Code
df_example <- draw_sim_stratified(dgm=dgm,ss=1,wname="meno",bw=-log(5),strata_rand="stratum",checking=FALSE,details=FALSE,return_df=TRUE,keep_rand=TRUE)
# non-stratified
kmH.fit<-KM.plot.2sample.weighted(df=df_example, 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",
risk.set=TRUE, by.risk=6,
details=TRUE,
Xlab="Months",Ylab="Recurrence-free survival",
arms=c("Experimental","Control"))
Cox un-adjusted HR= 0.395 
Cox CIs= 0.269 0.579 
My p-value and survdiff= 7.968842e-07 7.968842e-07 
My z^2 and survdiff= 24.36537 24.36537 
Comparing with R's survfit 
Treat=1: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Treat=0: Max error max|KM(survfit)[t]-KM(mine)[t]| = 0 
Comparing with R's survfit (experimental 
Treat=1: n,events= 246 33 
Median, Lower, Upper= Inf Inf Inf 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
   246.00    246.00    246.00     33.00     98.24      2.74        NA        NA 
  0.95UCL 
       NA 
Comparing with R's survfit (Control 
Treat=0: n,events= 440 131 
Median, Lower, Upper= 82.308 73.76067 Inf 
survfit medians 
  records     n.max   n.start    events     rmean se(rmean)    median   0.95LCL 
   440.00    440.00    440.00    131.00     77.29      3.68     82.31     73.76 
  0.95UCL 
       NA 
Code
title(main="ITT Population")

3.5 Looking at 9 simulated AP trials when AP region is a strong factor (\(\beta_{w} = -\log(5)\)) and prognostic null (\(\beta_{w}=0\))

3.5.1 Nine simulated AP realizations under \(\beta_{w}= -\log(5)\)

Code
par(mfrow=c(3,3))

for(ss in 1:9){
df_ex <- draw_sim_stratified(dgm=dgm,ss=ss,wname="meno",bw=-log(5),strata_rand="stratum")
kmH.fit<-KM.plot.2sample.weighted(df=subset(df_ex,meno==1), 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",strata.name="strata.simR",
risk.set=TRUE, by.risk=12, details=FALSE, show.med=FALSE, cox.cex= 1.25,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Survival",
arms=c("Experimental","Control"))
title(main="Post-menopausal subgroup")
}

3.5.2 Nine simulated AP realizations under \(\beta_{w} = 0\)

Code
par(mfrow=c(3,3))

for(ss in 1:9){
df_ex <- draw_sim_stratified(dgm=dgm,ss=ss,wname="meno",bw=-log(1),strata_rand="stratum")
kmH.fit<-KM.plot.2sample.weighted(df=subset(df_ex,meno==1), 
tte.name="y.sim", event.name="event.sim", treat.name="treat.sim",strata.name="strata.simR",
risk.set=TRUE, by.risk=12, details=FALSE,show.med=FALSE, cox.cex= 1.25,
show.logrank=FALSE, put.legend.arms="top",
Xlab="Months",Ylab="Survival",
arms=c("Experimental","Control"))
title(main="AP Population")
}

3.6 Compare non-parametric (cubic-spline) fit with true \(\psi^{0}\)

For ITT dataset

Note that under uniform effect this should fluctuate around truth (0.6 here)

Code
# ydel (default=0.25) is how much room to provide in space for counts
temp <- cox_cs_fit2(df=df_example,tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 8, ydel=0.5, byz=0.25, zwindow=0.25, show_plot=TRUE,truebeta.name="loghr.po")
title("ITT")

Within meno subgroups

Code
par(mfrow=c(1,2))

temp <- cox_cs_fit2(df=subset(df_example,meno==0),tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 8, ydel=0.5, byz=0.25, zwindow=0.25, show_plot=TRUE,truebeta.name="loghr.po")
title("Pre-meno")


temp <- cox_cs_fit2(df=subset(df_example,meno==1),tte.name="y.sim",event.name="event.sim",treat.name="treat.sim",
strata.name="strata.simO",z.name=c("z"), details=FALSE ,boots=0, xlab=c("Biomarker"),maxz = 8, ydel=0.5, byz=0.25, zwindow=0.25, show_plot=TRUE,truebeta.name="loghr.po")
title("Post-meno")

3.7 Many subgroup analyses under uniform treatment effects (hr=0.7, and NO subgroup effects)

What can happen?

Setup subgroups and artificially generate small subgroups

Code
set.seed(8316951)
# Include subgroups with specific sample sizes via randomly sampling
sg_ran <- rbinom(nrow(df.case),size=1,prob=0.2)
# Take 1st 15
temp <- cumsum(sg_ran)
sg_ran15 <- sg_ran
sg_ran15[which(temp>15)] <- 0
cat("# of subjects in sg_ran15",c(sum(sg_ran15)),"\n")
# of subjects in sg_ran15 15 
Code
df.case[,"random15"] <- sg_ran15

sg_ran20 <- sg_ran
# Take 1st 20
sg_ran20[which(temp>20)] <- 0
cat("# of subjects in sg_ran20",c(sum(sg_ran20)),"\n")
# of subjects in sg_ran20 20 
Code
df.case[,"random20"] <- sg_ran20

sg_ran40 <- sg_ran
# Take 1st 40
sg_ran40[which(temp>40)] <- 0
cat("# of subjects in sg_ran40",c(sum(sg_ran40)),"\n")
# of subjects in sg_ran40 40 
Code
df.case[,"random40"] <- sg_ran40

sg_ran60 <- sg_ran
# Take 1st 60
sg_ran60[which(temp>60)] <- 0
cat("# of subjects in sg_ran60",c(sum(sg_ran60)),"\n")
# of subjects in sg_ran60 60 
Code
df.case[,"random60"] <- sg_ran60
rm("temp")

Create an artificial country name

Code
set.seed(8316951)
# Include subgroups with specific sample sizes via randomly sampling
country_ran <- rbinom(nrow(df.case),size=1,prob=0.1)
df.case[,"country"] <- ifelse(country_ran==1,"Somewhere","Elsewhere")

In the simulations the baseline characteristics will be as-is in the dataset

Code
table1(~ age + factor(bm_low) + factor(meno) + size + factor(grade) + nodes + pgr + er + factor(sg_ran15) + factor(sg_ran20) + factor(sg_ran40) + factor(sg_ran60) + factor(country) | hormon, data=df.case)
0
(N=440)
1
(N=246)
Overall
(N=686)
age
Mean (SD) 51.1 (9.96) 56.6 (9.41) 53.1 (10.1)
Median [Min, Max] 50.0 [21.0, 80.0] 58.0 [32.0, 80.0] 53.0 [21.0, 80.0]
factor(bm_low)
0 373 (84.8%) 215 (87.4%) 588 (85.7%)
1 67 (15.2%) 31 (12.6%) 98 (14.3%)
factor(meno)
0 231 (52.5%) 59 (24.0%) 290 (42.3%)
1 209 (47.5%) 187 (76.0%) 396 (57.7%)
size
Mean (SD) 29.6 (14.4) 28.8 (14.1) 29.3 (14.3)
Median [Min, Max] 25.0 [3.00, 120] 25.0 [4.00, 100] 25.0 [3.00, 120]
factor(grade)
1 48 (10.9%) 33 (13.4%) 81 (11.8%)
2 281 (63.9%) 163 (66.3%) 444 (64.7%)
3 111 (25.2%) 50 (20.3%) 161 (23.5%)
nodes
Mean (SD) 4.94 (5.56) 5.13 (5.33) 5.01 (5.48)
Median [Min, Max] 3.00 [1.00, 51.0] 3.00 [1.00, 36.0] 3.00 [1.00, 51.0]
pgr
Mean (SD) 102 (170) 124 (250) 110 (202)
Median [Min, Max] 32.0 [0, 1600] 35.0 [0, 2380] 32.5 [0, 2380]
er
Mean (SD) 79.7 (124) 126 (191) 96.3 (153)
Median [Min, Max] 32.0 [0, 898] 46.0 [0, 1140] 36.0 [0, 1140]
factor(sg_ran15)
0 430 (97.7%) 241 (98.0%) 671 (97.8%)
1 10 (2.3%) 5 (2.0%) 15 (2.2%)
factor(sg_ran20)
0 428 (97.3%) 238 (96.7%) 666 (97.1%)
1 12 (2.7%) 8 (3.3%) 20 (2.9%)
factor(sg_ran40)
0 417 (94.8%) 229 (93.1%) 646 (94.2%)
1 23 (5.2%) 17 (6.9%) 40 (5.8%)
factor(sg_ran60)
0 401 (91.1%) 225 (91.5%) 626 (91.3%)
1 39 (8.9%) 21 (8.5%) 60 (8.7%)
factor(country)
Elsewhere 391 (88.9%) 226 (91.9%) 617 (89.9%)
Somewhere 49 (11.1%) 20 (8.1%) 69 (10.1%)

3.8 Underlying uniform treatment benefit of hr=0.6

Note: allows for turning off evaluation since run first and then compiling document (loading results below)

Code
# Note dgm needs to be generated to include df.case updates
# PH 
log.hrs <- log(c(0.6,0.6,0.6))
dgm <- get_dgm_stratified(df=df.case,zeta=10,knot=5,log.hrs=log.hrs,strata_tte=NULL,details=FALSE)

wname <- c("meno")
bw <- -log(5)

# Create subgroup definitions (ID's) and corresponding names

subgroups_id <- c("itt==\"Y\"", "bm_low==0", "bm_low==1", "meno==0", "meno==1", 
"country==\"Somewhere\"","country==\"Elsewhere\"","age<=65","age>65",
"size <=  median(size)","pgr <= median(pgr)", "er <= median(er)",
"grade==1","grade==2","grade==3",
"random15==1", "random20==1", "random40==1","random60==1")

subgroups_name <- c("All Patients", "BM low", "BM non-low", "Pre-meno", "Post-meno",
"Somewhere","Elsewhere", "Age<=65", "Age>65",
"med(size)","med(pgr)","med(er)","grade1","grade2","grade3",
"random15", "random20","random40", "random60")

# Check 1 simulation
#res <- get_SGanalyses(dgm, subgroups_name,subgroups_id, sims=1, wname, bw, Ndraw = nrow(dgm$df_super), bmcut=log(2),outfile=NULL)

# Run sims=k (eg, k=1000) simulations and output (per outfile)

res <- get_SGanalyses(dgm, subgroups_name,subgroups_id, sims=5000, wname, bw, Ndraw = nrow(dgm$df_super), bmcut=log(2),
outfile=c("results/bm1_bw=log5_v1.Rdata"))

Let’s look at operating characteristics

Code
options(scipen = 1, digits = 2)  
load("results/bm1_bw=log5_v1.Rdata")

Look at how many Cox model estimates are missing across analyses (here there are 5000 simulations)

Code
options(scipen = 1, digits = 2)  

apply(subgroup_hrs1,2,function(x){sum(!is.na(x))})
All Patients       BM low   BM non-low     Pre-meno    Post-meno    Somewhere 
        5000         5000         5000         5000         5000         5000 
   Elsewhere      Age<=65       Age>65    med(size)     med(pgr)      med(er) 
        5000         5000         4990         5000         5000         5000 
      grade1       grade2       grade3     random15     random20     random40 
        5000         5000         5000         4962         4993         5000 
    random60 
        5000 
Code
apply(subgroup_hrs2,2,function(x){sum(!is.na(x))})
All Patients       BM low   BM non-low     Pre-meno    Post-meno    Somewhere 
        5000         5000         5000         5000         5000         5000 
   Elsewhere      Age<=65       Age>65    med(size)     med(pgr)      med(er) 
        5000         5000         4993         5000         5000         5000 
      grade1       grade2       grade3     random15     random20     random40 
        5000         5000         5000         4979         4998         5000 
    random60 
        5000 
Code
apply(subgroup_hrs3,2,function(x){sum(!is.na(x))})
All Patients       BM low   BM non-low     Pre-meno    Post-meno    Somewhere 
        5000         5000         5000         5000         5000         5000 
   Elsewhere      Age<=65       Age>65    med(size)     med(pgr)      med(er) 
        5000         5000         4993         5000         5000         5000 
      grade1       grade2       grade3     random15     random20     random40 
        5000         5000         5000         4909         4987         5000 
    random60 
        5000 
Code
apply(subgroup_hrs5,2,function(x){sum(!is.na(x))})
All Patients       BM low   BM non-low     Pre-meno    Post-meno    Somewhere 
        5000         5000         5000         5000         5000         5000 
   Elsewhere      Age<=65       Age>65    med(size)     med(pgr)      med(er) 
        5000         5000         4990         5000         5000         5000 
      grade1       grade2       grade3     random15     random20     random40 
        5000         5000         5000         4816         4950         5000 
    random60 
        5000 
Code
apply(subgroup_hrs6,2,function(x){sum(!is.na(x))})
All Patients       BM low   BM non-low     Pre-meno    Post-meno    Somewhere 
        5000         5000         5000         5000         5000         5000 
   Elsewhere      Age<=65       Age>65    med(size)     med(pgr)      med(er) 
        5000         5000         4995         5000         5000         5000 
      grade1       grade2       grade3     random15     random20     random40 
        5000         5000         5000         4981         4997         5000 
    random60 
        5000 
Code
# Note that analysis 4 "hrs4" stratifies by biomarker low and thus "BM low" is redundant
apply(subgroup_hrs4,2,function(x){sum(!is.na(x))})
All Patients       BM low   BM non-low     Pre-meno    Post-meno    Somewhere 
        5000            0         5000         5000         5000         5000 
   Elsewhere      Age<=65       Age>65    med(size)     med(pgr)      med(er) 
        5000         5000         4992         5000         5000         5000 
      grade1       grade2       grade3     random15     random20     random40 
           0         5000         5000            0            0         5000 
    random60 
        5000 

3.8.1 ITT analyses: Look at performance for all six analyses getSG_dfhrSIX()

Code
options(scipen = 1, digits = 2)  

# Note that use of alpha=0.01 means we want to examine broad range of distribution of HR estimates
# Here alpha=0.01 does NOT represent CI for analysis --- simulated analyses are done at traditional 0.025

SG_table <- getSG_dfhrSIX(z=apply(subgroup_ns,2,mean), x1=subgroup_hrs1, x2=subgroup_hrs2, x3=subgroup_hrs3, x4=subgroup_hrs4, x5=subgroup_hrs5, x6=subgroup_hrs6,
ubx1=subgroup_ubs1, ubx2=subgroup_ubs2, ubx3=subgroup_ubs3, ubx4=subgroup_ubs4,
ubx5=subgroup_ubs5, ubx6=subgroup_ubs6, analysisx1="sR", analysisx2="none", analysisx3="sX", analysisx4="sBM*", analysisx5="sX+sR", analysisx6="X+O+sR",
which_sgs=c("All Patients"), alpha=0.01)


dt <- SG_table

dthi <- dt[,c("hi")]

which_inf <- which(!is.na(dthi) & dthi>1000) 
dt[which_inf,c("hi")] <- Inf

# indent the subgroup if there is a number in the est column
dt$Subgroup <- ifelse(is.na(dt$est),
dt$Subgroup, paste0("   ", dt$Subgroup))

names(dt)[names(dt) == "Subgroup"] <- "Subpopln"

# Add blank column for the forest plot to display CI.
# Adjust the column width with space.
dt$` ` <- paste(rep(" ", 20), collapse = " ")

# Create confidence interval column to display
dt$`HR (99% ECI)` <- ifelse(is.na(dt$se), "",
                             sprintf("%.2f (%.2f to %.2f)",
                                     dt$est, dt$low, dt$hi))

tm <- forest_theme(base_size = 10,
                   refline_gp = gpar(col = "red"),
                   footnote_gp = gpar(col = "#636363", fontface = "italic"))

# Adding HR<1
# Change ci_column from 4 to 5?

p <- forest(dt[,c(1:4, 9:10)],
            est = dt$est,
            lower = dt$low,
            upper = dt$hi,
            sizes = dt$se,
            ci_column = 5,
            ref_line = 0.8,
            arrow_lab = c("HR < 0.8", ">= 0.8"),
            xlim = c(0.25, 1.1),
            ticks_at = c(0.6, 0.80, 1.0),
            footnote = "HR estimates",
            theme = tm)

3.9 Subgroup analyses: Distribution of HR estimates across subgroups for standard analysis stratified by randomization factors (analysis=“sR”)

Note that which_sgs allows for choosing which subgroups to display

Look at all subgroups_name but for only 1 analysis “x1 = sR”

Code
options(scipen = 1, digits = 2)  

# Note that use of alpha=0.01 means we want to examine broad range of distribution of the upper bounds of HR estimates
# Here alpha=0.01 does NOT represent CI for analysis --- the upper bounds are 95% at analysis level

SG_table <- getSG_dfhrONE(z=apply(subgroup_ns,2,mean), x1=subgroup_hrs1, x2=subgroup_hrs2, x3=subgroup_hrs3,
ubx1=subgroup_ubs1, ubx2=subgroup_ubs2, ubx3=subgroup_ubs3, ubx4=subgroup_ubs4,
ubx5=subgroup_ubs5, ubx6=subgroup_ubs6, analysisx1="sR", 
which_sgs=subgroups_name, alpha=0.01)

dt <- SG_table

dthi <- dt[,c("hi")]

which_inf <- which(!is.na(dthi) & dthi>1000) 
dt[which_inf,c("hi")] <- Inf

# indent the subgroup if there is a number in the est column
dt$Subgroup <- ifelse(is.na(dt$est),
dt$Subgroup, paste0("   ", dt$Subgroup))

names(dt)[names(dt) == "Subgroup"] <- "Subpopln"

# Add blank column for the forest plot to display CI.
# Adjust the column width with space.
dt$` ` <- paste(rep(" ", 20), collapse = " ")

# Create confidence interval column to display
dt$`HR (99% ECI)` <- ifelse(is.na(dt$se), "",
                             sprintf("%.2f (%.2f to %.2f)",
                                     dt$est, dt$low, dt$hi))

tm <- forest_theme(base_size = 10,
                   refline_gp = gpar(col = "red"),
                   footnote_gp = gpar(col = "#636363", fontface = "italic"))

# Adding HR<1
# Change ci_column from 4 to 5?

p <- forest(dt[,c(1:4, 9:10)],
            est = dt$est,
            lower = dt$low,
            upper = dt$hi,
            sizes = dt$se/2,
            ci_column = 5,
            ref_line = 0.8,
            arrow_lab = c("HR < 0.8", ">= 0.8"),
            xlim = c(0.25, 1.1),
            ticks_at = c(0.6, 0.80, 1.0),
            title="Cox hazard-ratio estimates",
            footnote = "HR estimates",
            theme = tm)

3.10 Subgroup analyses: Distribution of upper bound (ub(HR)) of HR estimates across subgroups for standard analysis stratified by randomization factors (analysis=“sR”)

Note that which_sgs allows for choosing which subgroups to display

Look at all subgroups_name but for only 1 analysis “x1 = sR”

Code
options(scipen = 1, digits = 2)  

# Note that use of alpha=0.01 means we want to examine broad range of distribution of the upper bounds of HR estimates
# Here alpha=0.01 does NOT represent CI for analysis --- the upper bounds are 95% at analysis level

SG_table <- getSG_dfubONE(z=apply(subgroup_ns,2,mean), x1=subgroup_hrs1, x2=subgroup_hrs2, x3=subgroup_hrs3,
ubx1=subgroup_ubs1, ubx2=subgroup_ubs2, ubx3=subgroup_ubs3, ubx4=subgroup_ubs4,
ubx5=subgroup_ubs5, ubx6=subgroup_ubs6, analysisx1="sR", 
which_sgs=subgroups_name, alpha=0.01)

dt <- SG_table

dthi <- dt[,c("hi")]

which_inf <- which(!is.na(dthi) & dthi>1000) 
dt[which_inf,c("hi")] <- Inf

# indent the subgroup if there is a number in the est column
dt$Subgroup <- ifelse(is.na(dt$est),
dt$Subgroup, paste0("   ", dt$Subgroup))

names(dt)[names(dt) == "Subgroup"] <- "Subpopln"

# Add blank column for the forest plot to display CI.
# Adjust the column width with space.
dt$` ` <- paste(rep(" ", 20), collapse = " ")


dt$`UB(HR) (99% ECI)` <- ifelse(is.na(dt$se), "",
                             sprintf("%.2f (%.2f to %.2f)",
                                     dt$est, dt$low, dt$hi))


tm <- forest_theme(base_size = 10,
                   refline_gp = gpar(col = "red"),
                   footnote_gp = gpar(col = "#636363", fontface = "italic"))


p <- forest(dt[,c(1:4, 9:10)],
            est = dt$est,
            lower = dt$low,
            upper = dt$hi,
            sizes = dt$se/2,
            ci_column = 5,
            ref_line = 1,
            arrow_lab = c("Upper Bound < 1", ">= 1"),
            xlim = c(0.5, 6),
            ticks_at = c(1.0, 2, 3, 4, 5, 6),
            title="Upper-bound estimates (re: hr point estimates)",
            footnote = "UB(HR) estimates",
            theme = tm)

References

Aalen, O. O., R. J. Cook, and K. Røysland. 2015. “Does Cox Analysis of a Randomized Survival Study Yield a Causal Treatment Effect?” Lifetime Data Analysis, 579–93.